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We consider settings where the observations are drawn from a zero-mean mul- 
tivariate (real or complex) normal distribution with the population covariance ma- 
trix having eigenvalues of arbitrary multiplicity. We assume that the eigenvectors of 
the population covariance matrix are unknown and focus on inferential procedures 
that are based on the sample eigenvalues alone (i.e., "eigen-inference"). 

Results found in the literature establish the asymptotic normality of the fluctu- 
ation in the trace of powers of the sample covariance matrix. We develop concrete 
algorithms for analytically computing the limiting quantities and the covariance 
of the fluctuations. We exploit the asymptotic normality of the trace of powers of 
the sample covariance matrix to develop eigenvalue-based procedures for testing 
and estimation. Specifically, we formulate a simple test of hypotheses for the pop- 
ulation eigenvalues and a technique for estimating the population eigenvalues in 
settings where the cumulative distribution function of the (nonrandom) population 
eigenvalues has a staircase structure. 

Monte Carlo simulations are used to demonstrate the superiority of the pro- 
posed methodologies over classical techniques and the robustness of the proposed 
techniques in high-dimensional, (relatively) small sample size settings. The im- 
proved performance results from the fact that the proposed inference procedures 
are "global" (in a sense that we describe) and exploit "global" information thereby 
overcoming the inherent biases that cripple classical inference procedures which are 
"local" and rely on "local" information. 
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1. Introduction. Let X = [xi,...,x n ] be a p x n data matrix where 
xi,...,x n denote n independent measurements, where for each i, Xj has 
a p-dimensional (real or complex) Gaussian distribution with mean zero, 
and positive definite covariance matrix S. When the samples are complex, 
the real and imaginary components are assumed to be independent, identi- 
cally distributed zero- mean Gaussian vectors with a covariance of 5]/2. The 
sample covariance matrix (SCM) when formed from these n samples as 

1 n 1 
(1.1) S:=-$>xJ = -XX', 

n ~ n 

i=i 

with ' denoting the conjugate transpose, has the (central) Wishart distri- 
bution [Wishart (1928)]. We focus on inference problems for parameterized 
covariance matrices modeled as T*q = UAqU' where 

a{L pl 

3 

where a\> ■ ■■> and J2j=iPj = P- Defining ij =Pi/p allows us to conve- 
niently express the 2k — 1-dimensional parameter vector as 9 = (t±, . . . , t^—i, 
oi, . . . , a*;) with the obvious nonnegativity constraints on the elements. 

Models of the form in (1.2) arise as a special case whenever the measure- 
ments are of the form 

(1.3) Xj = Asj + Zj for i = 1, . . . , n, 

where Zj ~A/^(0,S z ) denotes a p-dimensional (real or complex) Gaussian 
noise vector with covariance S a , s, ~A/jt(0,5j s ) denotes a A;-dimensional 
zero-mean (real or complex) Gaussian signal vector with covariance S s , and 
A is a p x k unknown nonrandom matrix. In array processing applications, 
the jth column of the matrix A encodes the parameter vector associated 
with the jth signal whose amplitude is described by the jth element of Sj. 
See, for example, the text by Van Trees (2002). 

Since the signal and noise vectors are independent of each other, the 
covariance matrix of Xj can be decomposed as 

(1.4) £ = * + 

where is the covariance of z and \l/ = AS S A'. One way of obtaining X 
with eigenvalues of the form in (1.2) is when S 2 = a 2 I so that the n — k 
smallest eigenvalues of I] are equal to a 2 . Then, if the matrix A is of full 
column rank and the covariance matrix of the signals 51 s is nonsingular, 
the p — k (with k < p here) smallest eigenvalues of ^ are equal to zero so 
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that the eigenvalues of X will be of the form in (1.2). Alternatively, if the 
eigenvalues of \1/ and S 2 have the identical subspace structure, that is, in 
(1.2), tf = tf~ for all i, then whenever the eigenvectors associated with each 
of the subspaces of \I/ and align, the eigenvalues of S will have the 
subspace structure in (1.2). 

Additionally, from an identifiability point of view, as shall be discussed 
in Section 7, if the practitioner has reason to believe that the population 
eigenvalues are organized in several clusters about a« ± ai^p/n, then the use 
of the model in (1.2) with a block subspace structure will also be justified. 

1.1. Inferring the population eigenvalues from the sample eigenvalues. 
While inference problems for these models have been documented in texts 
such as (Muirhead (1982)), the inadequacies of classical algorithms in high- 
dimensional, (relatively) small sample size settings have not been adequately 
addressed. We highlight some of the prevalent issues in the context of sta- 
tistical inference and hypothesis testing. 

In the landmark paper [Anderson (1963)], the theory was developed that 
describes the (large sample) asymptotics of the sample eigenvalues (in the 
real-valued case) for such models when the true covariance matrix has eigen- 
values of arbitrary multiplicity. Indeed, for arbitrary covariance X, the joint 
density function of the eigenvalues l\, . . . ,l p of the SCM S when n > p+ 1 is 
shown to be given by 

(1-5) Z^lf^^fllk-ljfJ ex P (-^IY(S- 1 VSV'))dV, 

i=l i<j Q V / 

where l\ > • • ■ > l p > 0, Z§ n is a normalization constant, and = 1 (or 
2) when S is real (resp., complex). In (1.5), Q £ 0(p) when (3 = 1 while 
Q € U(p) when (5 = 2 where 0(p) and U(p) are, respectively, the set of 
p x p orthogonal and unitary matrices with Haar measure. Anderson notes 
that 

If the characteristic roots of X are different, the deviations . . . from the corre- 
sponding population quantities are asymptotically normally distributed. When 
some of the roots of S are equal, the asymptotic distribution cannot be de- 
scribed so simply. 

Indeed, the difficulty alluded to S arises due to the presence of the integral 
over orthogonal (or unitary) group on the right-hand side of (1.5). This prob- 
lem is compounded in situations when some of the eigenvalues of S are equal 
as is the case for the model considered in (1.2). In such settings, large sam- 
ple approximations for this multivariate integral have been used [see, e.g., 
Muirhead (1982), page 403, Corollary 9.5.6, Butler and Wood (2002, 2005)]. 
For the problem of interest, Anderson uses just such an approximation to 
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derive the maximum-likelihood estimate of the population eigenvalues, ai, 
as 

(1.6) ai^—^Xj for l = l,...,k, 

where \j are the sample eigenvalues (arranged in descending order) and iVj 

is the set of integers p\ H \~Pi-i + 1, • • • ,Pi + • • • +Pi- This is a reasonable 

estimator that works well in practice when n^> p. The large sample size 
asymptotics are, however, of limited utility because they ignore the (sig- 
nificant) effect of the dimensionality of the system on the behavior of the 
sample eigenvalues. 

Consequently, (large sample size) asymptotic predictions, derived under 
the p fixed, n — > oo regime do not account for the additional complexities 
that arise in situations where the sample size n is large but the dimension- 
ality p is of comparable order. Furthermore, the estimators developed using 
the classical large sample asymptotics invariably become degenerate when- 
ever p > n, so that p — n of the sample eigenvalues will identically equal 
to zero. For example, when n = p/2, and there are two distinct population 
eigenvalues each with multiplicity p/2, then the estimate of the smallest 
eigenvalue using (1.6) will be zero. Other such scenarios where the popula- 
tion eigenvalue estimates obtained using (1.6) are meaningless are easy to 
construct and are practically relevant in many applications such as radar 
and sonar signal processing, and many more. See, for example, the text by 
Van Trees (2002) and the work of Smith (2005). 

There are, of course, other strategies one may employ for inferring the 
population eigenvalues. One might consider a maximum- likelihood technique 
based on maximizing the log-likelihood function of the observed data X 
which is given by (ignoring constants) 

J(X|E) := -n(TrSE -1 + logdetE), 

or, equivalently, when E = UAU', by minimizing the objective function 

(1.7) fc(X|U,A) = (TrSUA _1 U' + logdetA). 

What should be apparent on inspecting (1.7) is that the maximum-likelihood 
estimation of the parameters of A of the form in (1.2) requires us to model 
the population eigenvectors U as well (except when k = 1). If U were known 
a priori, then an estimate of ai obtained as 

(1.8) ai tts - (U'8U) jt j for l = l,...,k, 

Pl jeN t 

where Ni is the set of integers p\ + ■ ■ ■ +pi—i + 1, . . . ,pi + ■ ■ - + pi, will provide 
a good estimate. In practical applications, the population eigenvectors might 
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Fig. 1. The challenge of estimating the population eigenvalues from the sample eigenval- 
ues in high-dimensional settings, (a) Sample eigenvalues versus true eigenvalues (p = 80, 
n= 160). (b) Sample eigenvectors when U = I. (c) Diagonal elements of S when U = I. 
(d) Sample eigenvectors for arbitrary U. (e) Diagonal elements of S for arbitrary U. 
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either be unknown or be misspecified leading to faulty inference. Hence it is 
important to have the ability to perform statistically sound, computation- 
ally feasible eigen-inference of the population eigenvalues, that is, from the 
sample eigenvalues alone, in a manner that is robust to high-dimensionality 
and sample size constraints. 

We illustrate the difficulties encountered in high-dimensional settings with 
an example (summarized in Figure 1) of a SCM constructed from a covari- 
ance matrix modeled as S = UAU' with p = 80 and sample size n = 160. 
Half of the eigenvalues of A are equal to 2 while the remainder are equal 
to 1. The sample eigenvalues are significantly blurred, relative to the true 
eigenvalues as shown in Figure 1(a). Figure 1(b) and (d) plot the sample 
eigenvectors for the case when the true eigenvectors U = I, and an arbitrary 
U, respectively. Figure 1(c) and (e) plot the diagonal elements (S)jj. Thus, 
if the true eigenvector was indeed U = I, then an estimate of the population 
eigenvalues formed as in (1.8) yields a good estimate; when U/I, however, 
the estimate is very poor. 

1.2. Testing for equality of population eigenvalues. Similar difficulties 
are encountered in problems of testing as well. In such situations, for testing 
the hypothesis 

^PiH hpz-i+l = -VlH hPi-i+1,— ,piH hPr 

Anderson proposes the likelihood criterion given by 

/ \ Pk~\ n/2 



for I = 1, 



(1.9) V t 

'.jeNt ' \ jeN t 

where \j are the sample eigenvalues (arranged in descending order) and, 
as before, iVj is the set of integers p\ + ■ ■ ■ + pi-\ + 1, . . . ,pi + ■ • ■ + Pi- The 
test in (1.9) suffers from the same deficiency as the population eigenvalue 
estimator in (1.6) — it becomes degenerate when p> n. When the population 
eigenvectors U are known, (1.9) may be modified by forming the criterion 

n/2 

n^'su^/i^E^su),/ 

When the eigenvectors are misspecified, the inference provided will be faulty. 
For the earlier example, Figure 1(e) illustrates this for the case when it is 
assumed that the population eigenvectors are I when they are really U/I. 
Testing the hypothesis S = So reduces to testing the hypothesis S = I, given 

- ~ —1/2 

samples Xj for i = 1, . . . ,n, where x, = S Q Xj. The robustness of tests for 
sphericity in high-dimensional settings has been extensively discussed by 
Ledoit and Wolf (2002) and is the focus of some recent work by Srivastava 
(2005, 2006). 



for I = 1, . . . , k. 
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Table 1 
Structure of proposed algorithms 



Testing: Hs : h(G) :— Vg Q e 1 ve, Recommend dim(ve) = 2 

Estimation: 9 — argminjvg Q~ 1 ve + log dot Qe}, Recommend dim(vg) = dim(0) + 1 

96® 

Legend: (ve), = p x U Tr S 3 ' - of) , j = 1, . . . , q 

Qe = cov[v fl Vg] where (Qe)ij = af 3 



1.3. Proposed statistical eigen-inference techniques. In this article our fo- 
cus is on developing population eigenvalue estimation and testing algorithms 
for models of the form in (1.2) that are robust to high-dimensionality, sam- 
ple size constraints and population eigenvector misspecification in the spirit 
of the initial exploratory work in Rao and Edelman (2006). We are able to 
develop such computationally feasible algorithms by exploiting the proper- 
ties of the eigenvalues of large Wishart matrices. These results analytically 
describe the nonrandom blurring of the sample eigenvalues, relative to the 
population eigenvalues, in the p,n(p) — > oo limit while compensating for the 
random fluctuations about the limiting behavior due to finite-dimensionality 
effects. The initial work in Rao and Edelman (2006) only exploited the non- 
random blurring of the sample eigenvalues without accounting for the ran- 
dom fluctuations; this was equivalent to employing the estimation procedure 
in Table 1 with Qq = I. 

Taking into account the statistics of the fluctuations results in an im- 
proved performance and allows us to handle the situation where the sample 
eigenvalues are blurred to the point that the block subspace structure of 
the population eigenvalues cannot be visually discerned, as in Figure 1(a), 
thereby extending the "signal" detection capability beyond the special cases 
tackled in Silverstein and Combettes (1992). The nature of the mathematics 
being exploited makes them robust to the high-dimensionality and sample 
size constraints while the reliance on the sample eigenvalues alone makes 
them insensitive to any assumptions on the population eigenvectors. In such 
situations where the eigenvectors are accurately modeled, the practitioner 
may use the proposed methodologies to complement and "robustify" the 
inference provided by estimation and testing methodologies that exploit the 
eigenvector structure. 

We consider testing the hypothesis for the equality of the population 
eigenvalues and statistical inference about the population eigenvalues. In 
other words, for some unknown U, if So = UAgyU', where Ag is modeled as 
in (1.2), techniques to (1) test if £ = So, and (2) estimate 6q are summarized 
in Table 1. We note that inference on the population eigenvalues is performed 
using the entire sample eig en- spectrum unlike (1.6) and (1.9). This reflects 
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Table 2 

Comparison of performance of different techniques for estimating the nonunity 
population eigenvalue in Figure 1 when the block structure is known a priori 







Known U 


Known U 




Unknown U 






p 


n 


Max Like. 


Max Like.Xp 


2 Anderson 


Max Like. 


SEI 


SEIxp 2 










(a) Bias 








10 


20 


0.0117 


0.1168 


-1.9994 


-0.5811 


-0.0331 


-0.3308 


20 


40 





0.0001 


-1.9994 


-0.5159 


-0.0112 


-0.2244 


40 


80 


0.0008 


0.0301 


-1.9994 


-0.5245 


-0.0019 


-0.0776 


80 


160 


-0.0003 


-0.0259 


-1.9994 


-0.4894 


-0.0003 


-0.0221 


160 


320 


0.0000 


0.0035 


-1.9994 


-0.4916 


-0.0003 


-0.0411 


320 


640 


0.0001 


0.0426 


-1.9994 


-0.5015 


0.0001 


0.0179 










(b) MSE 








10 


20 


0.0380 


3.7976 


3.9990 


0.3595 


0.0495 


4.9463 


20 


40 


0.0100 


3.9908 


3.9990 


0.2722 


0.0126 


5.0256 


40 


80 


0.0025 


3.9256 


3.9991 


0.2765 


0.0030 


4.8483 


SO 


160 


0.0006 


4.1118 


3.9991 


0.2399 


0.0008 


5.1794 


160 


320 


0.0002 


4.1022 


3.9990 


0.2417 


0.0002 


5.0480 


320 


640 


0.0000 


4.0104 


3.9990 


0.2515 


0.0000 


5.0210 



the inherent nonlinearities of the sample eigenvalue blurring induced by 
high-dimensionality and sample size constraints. 

Table 2 compares the bias and mean square error of various techniques 
of estimating the nonunity population eigenvalue in Figure 1 (the SCM is 
complex- valued) when the block structure is known a priori, that is, when 
t\=t2 = 0.5, and ai = 1 are known and a := a\ is unknown and to be es- 
timated. The first two columns refer to the procedure in (1.8) where the 
correct population eigenvectors U/I are used, the third column refers to 
Anderson's procedure in (1.6) while the fourth column refers to the pro- 
cedure in (1.8) where U = I is used instead of the population eigenvec- 
tors. The last two columns refer to the proposed statistical eigen-inference 
(SEI) technique in Table 1 with 6 := a, v(6) = Tr S - p(0.5a + 0.5), and 
Qg = {I /2a 2 + l/2a 2 c + ac + 1/2 + l/2c - a)c 2 where c = p/n. Note that 
though the SEI techniques do not exploit any eigenvector information, their 
performance compares favorably to the maximum-likelihood technique that 
does. As for the other techniques, it is evident that the inherent finite sample 
biases in the problem cripple the estimators derived on the basis of classical 
large sample asymptotics. 

An important implication of this in practice is that in high-dimensional, 
sample size starved settings, local inference, performed on a subset of sample 
eigenvalues alone, that fails to take into account the global structure (i.e., by 
modeling the remaining eigenvalues) is likely to be inaccurate, or worse mis- 
leading. In such settings, practitioners are advised to consider tests (such as 
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the ones proposed) for the equality of the entire population eigen-spectrum 
instead of testing for the equality of individual population eigenvalues. 

We view the inference techniques developed herein as the first step in the 
development of improved high-dimensional covariance matrix estimation al- 
gorithms. The issue of inverse covariance matrix estimation which Srivastava 
(2007) examines in the context of discriminant analysis is also related. 

The approach we have in mind differs from the (sample eigenvalue) shrinkage- 
based techniques in Haff (1980), Dey and Srinivasan (1985) in a crucial re- 
gard. Our perspective is that the eigenvalues and the eigenvectors (or sub- 
spaces) of the sample covariance matrices are blurred relative to the popula- 
tion eigenvalues and eigenvectors (or subspaces), respectively. For the model 
considered in this article, the precise analytical characterization of the blur- 
ring of the eigenvalues (Theorem 2.7) allows us to formulate and solve the 
deblurring problem. The tools from free probability are applied in the first 
author's dissertation [see Nadakuditi (2007)] to precisely describe the blur- 
ring of the population eigenvectors (or subspaces) as well. The answer is 
encoded in the form of a conditional eigenvector "distribution" that explic- 
itly takes into account the dimensionality of the system and the sample size 
available — the conditioning is with respect to the population eigenvalues. 
The idea that the covariance matrix estimate thus constructed from the 
deblurred eigenvalues and eigenvectors should be significantly better has 
merit. The development of computationally realizable eigenvector deblur- 
ring algorithms is a significant obstacle to progress along this direction of 
research. 

1.4. Related work. There are other alternatives found in the literature 
to the block subspace hypothesis testing problem considered in this arti- 
cle. El Karoui (2007) provides a test for the largest eigenvalue for a large 
class of complex Wishart matrices including those with a population co- 
variance matrix of the form in (1.2). Though the results are stated for the 
case when p < n, simulations confirm the validity of the techniques to the 
alternative case when p>n and for real Wishart matrices. El Karoui's tests 
can be classified as a local test that utilizes global information, that is, in- 
formation about the entire (assumed) population eigen-spectrum. Testing 
is performed by computing the largest eigenvalue of the sample covariance 
matrix, recentering, rescaling it and rejecting the hypothesis if it is too large. 
The recentering and rescaling parameters are determined by the a% and tj 
values in (1.2) while the threshold is determined by the quantiles of the ap- 
propriate (real or complex) Tracy- Widom distribution [Tracy and Widom 
(1994, 1996), Johnstone (2001)]. A disadvantage of this procedure is the 
great likelihood that recentering by the false parameter pushes the test 
statistic toward the left tail of the distribution. Consequently, the iden- 
tity covariance hypothesis will be accepted with great likelihood whenever 
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the recentering and rescaling coefficients are calculated for the model in 
(1.2) with a,i > 1. The proposed global test based on global information 
avoids this pitfall and is based on distributional results for the traces of 
powers of Wishart matrices that also appear in Srivastava (2005). The is- 
sue of whether a local test or a global test is more powerful is important 
and highlighted using simulations in the context of a joint estimation and 
testing problem in Section 7; its full resolution is beyond the scope of this 
article. 

Silverstein and Combettes (1992) consider the situation when the sam- 
ple eigenvalues discernibly split into distinct clusters and suggest that the 
proportion of the eigenvalues in each cluster will provide a good estimate 
of the parameters aj in (1.2). The nature of the distributional results in 
Bai and Silverstein (1998) imply that whenever the sample eigenvalues are 
thus clustered, then for large enough p, the estimate of aj thus obtained 
will be exactly equal to true value. Such a procedure could not, however, 
be applied for situations such as those depicted in Figure 1(a) where the 
sample eigenvalues do not separate into clusters. Silverstein and Combettes 
(1992) do not provide a strategy for computing the tj in (1.2) once the is 
computed — the proposed techniques fill the void. 

A semiparametric, grid-based technique for inferring the empirical dis- 
tribution function of the population eigenvalues from the sample eigen- 
spectrum was proposed by El Karoui (2006). The procedure described can 
be invaluable to the practitioner in the initial data exploration stage by 
providing a good estimate of the number of blocks in (1.2) and a less re- 
fined estimate of the underlying and £j associated with each block. Our 
techniques can then be used to improve or test the estimates. 

1.5. Outline. The rest of this article is organized as follows. In Section 2 
we introduce the necessary definitions and summarize the relevant theorem. 
Concrete algorithms for computing the analytic expectations that appear 
in the algorithms summarized in Table 1 are presented in Section 3. The 
eigen- inference techniques are developed in Section 4. The performance of 
the algorithms is illustrated using Monte Carlo simulations in Section 5. 
Some concluding remarks are presented in Section 8. 

2. Preliminaries. 

Definition 2.1. Let A = An be an N x N matrix with real eigenvalues. 
The jth sample moment is defined as 

tr(A^):=lTr(A^), 

where Tr is the usual unnormalized trace. 
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Definition 2.2. Let A = A at be a sequence of self-adjoint N x N ran- 
dom matrices. If the limit of all moments denned as 

af=: lim E[tr(At)] (N G N) 
exists, then we say that A has a limit eigenvalue distribution. 

Notation 2.3. For a random matrix A with a limit eigenvalue dis- 
tribution we denote by Ma(x) the moment power series, which we define 
by 

M A (x) :=l + ^2afx j . 

Notation 2.4. For a random matrix ensemble A with limit eigenvalue 
distribution we denote by gA(%) the corresponding Cauchy transform, which 
we define as formal power series by 



9a{x) := lim E 

N— too 



^Tr(xl7v- A^)- 1 



-M A (l/x). 

x 



Definition 2.5. Let A := An be an N x N self-adjoint random matrix 
ensemble. We say that it has a second-order limit distribution if for all i, j G N 
the limits 



and 



exist and if 



af := lim fci(tr(A-L)) 

N— >oo 



<•:= lim k 2 (Ti(A N ),TT(A j N )) 

N—too 



hm A; r (TV(A^ 1) ),...,Tr(Af ) )) = 



N— >oo 

for all r > 3 and all j(l), . . . , j(r) G N. In this definition, we denote the 
(classical) cumulants by k n . Note that k\ is just the expectation, and k 2 the 
covariance. 

Notation 2.6. When A = Ajv has a limit eigenvalue distribution, then 
the limits af := limjv-»ooE[tr(Aj v -)] exist. When Ajy has a second-order 
limit distribution, the fluctuation 

tr(A^) - af 

is asymptotically Gaussian of order 1/N. We consider the second-order co- 
variances defined as 

af r .= lim cov(Tr(AV),Tr(A^)), 
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and denote by Ma( x ,u) the second-order moment power series, which we 
define by 

M A (x,y) ■- a tj xi y j - 



Theorem 2.7. Assume that the p x p (nonrandom) covariance matrix 
X = (Sp)p g pj has a limit eigenvalue distribution. Let S be the (real or com- 
plex) sample covariance matrix formed from the n samples as in (1.1). Then 
for p,n — > oo with p/n — > c € (0,oo), S has both a limit eigenvalue distribu- 
tion and a second-order limit distribution. The Cauchy transform of the limit 
eigenvalue distribution, g(x) = gs(x), satisfies the equation 

(2-1) g(x) = - t r5E 



1 — c + cxg(x) \l — c + cxg(x) 

with the corresponding power series M${x) = l/xgs(l/x). Define S = ^X'X 
so that its moment power series is given by 

(2.2) M~(y) = c(M s (z)-l) + l. 

The second- order moment generating series is given by 

(2.3a) M S (x,y) = Mg(x,y) = ^Mf(x,y), 



where 

(2.3b) Mf(x,y) = xy 



jjxM^x)) • jfoMg(y)) l 



(xM~(x) - yM~{y)f {x-yf 
where (3 equals 1 (or 2) when the elements of S are real (or complex). 



Proof. Theorem 2.7 is due to Bai and Silverstein. They stated and 
proved it in Bai and Silverstein (2004) by complex analysis tools. (Note, 
however, that there is a missing factor 2 in their formula (2.3a) that has 
been corrected in their book [Bai and Silverstein (2006), page 251, Lemma 
9.11, (9.8.4)].) □ 

Our equivalent formulation in terms of formal power series can, for the 
case (5 = 2, also be derived quite canonically by using the theory of second- 
order freeness. Let us also mention that the proof using second-order free- 
ness extends easily to the situation where XI is itself a random matrix with 
a second-order limit distribution. If we denote by My, the corresponding 
second-order moment power series of E, as in Notation 2.6, then the theory 
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of second-order freeness gives (for (5 = 2) the following extension of formula 
(2.3b): 

*?(,,,) - M E (*M sW ,„M §W ) ■ . i<« 

(2.4) S 

'4(xM s (x))-i(yM s (y)) 1 



(xMg(x)-yM § (y)) 2 (z-y) 5 



Whereas from an analytic point of view, formulas (2.1) and (2.4) might 
look quite mysterious, from the perspective of free probability theory there is 
an easy conceptual way of looking on them. Namely, they are just rewritings 
into formal power series of the following fact: the matrix S has a compound 
free Poisson distribution, for both its moments and its fluctuations. This 
means that the free cumulants of S of first and second-order are, up to scal- 
ing, given by the moments and the fluctuations, respectively, of (This 
should be compared to: a classical compound Poisson distribution is charac- 
terized by the fact that its classical cumulants are a multiple of the moments 
of the corresponding "jump distribution.") In the case where 5] is nonran- 
dom the fluctuations of H are clearly zero (and thus the second-order free 
cumulants of S vanish), that is, Mt, = 0, resulting in the special case (2.3b) 
of formula (2.4). 

For the definition of free cumulants and more information on second- 
order freeness, the interested reader should consult Collins et al. (2007), in 
particular, Section 2. 

3. Computational aspects. 

Proposition 3.1. ForUg = UAgU' as in (1.2), let 9 = {t\,.. .,tk-i,ai, 
. . . , dfc), where U = Pi/p. Then S has a limit eigenvalue distribution as well as 
a second- order limit distribution. The moments otj, and hence ctfj, depend 
on and c. Let vg be a q-by-1 vector whose jth element is given by 

(■v )j = Tr&-pa$. 

Then for large p and n, 

(3.1) v e ~AA(/x ,Q ), 
where \x$ = if S is complex and (Qe)i,j = otf ~. 

Proof. This follows directly from Theorem 2.7. From (3.2) and (3.4), 
the moments af depend on a s and c = p/n and hence on the unknown 
parameter vector 6. The existence of the nonzero mean when S is real follows 
from the statement in Bai and Silverstein (2004). □ 
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3.1. Computation of moments of limiting eigenvalue distribution. A method 
of enumerating the moments of the limiting eigenvalue distribution is to use 
the software package RMTool [Rao (2006)] based on the polynomial method 
developed in the second part of the first author's dissertation [Nadakuditi 
(2007)]. The software enables the moments of S to be enumerated rapidly 
whenever the moment power series of S is an algebraic power series, that 
is, it is the solution of an algebraic equation. This is always the case when 
S is of the form in (1.2). For example, if 6 = (ti,t2,ai,a2,as), then we can 
obtain the moments of S by typing in the following sequence of commands 
in Matlab once RMTool has been installed. This eliminates the need to 
obtain manually the expressions for the moments a priori: 

» startRMTool 

» syms c tl t2 al a2 a3 

>> number_of _moments = 5; 

» LmzSigma = atomLmz( [al a2 a3] , [tl t2 l-(tl+t2)]); 

>> LmzS = AtimesWish(LmzSigma, c) ; 

>> alpha_S = Lmz2MomF(LmzS ,number_of _moments) ; 

>> alpha_Stilde = c*alpha_S; 

An alternate and versatile method of computing the moments relies on 
exploiting (2.1) which expresses the relationship between the moment power 
series of S and that of S via the limit of the ratio p/n. This allows us 
to directly express the expected moments of S in terms of the moments 
of S. The general form of the moments of S, given by Corollary 9.12 in 
Nica and Speicher [(2006), page 143] is 



(3.2) af = ]T c^+^-^iafria^r ■ • ■ • 7? 



I2,...,lj' 



l«i+2i2+3i3H hjij=j 

where 7^ i . is the multinomial coefficient given by 
(3-3) 7? 



■12- 



ii!i 2 ! • • • ijKJ + 1 - (*i + *2 + ■ • ■ + ' 

The multinomial coefficient in (3.3) has an interesting combinatorial in- 
terpretation. Let j be a positive integer, and let i±, . . . ,ij G NU {0} be such 

that i\ +2i2 H \-jij = j- The number of noncrossing partitions tt S NC{j) 

which have %\ blocks with 1 element, %2 blocks with 2 elements, . . . , ij blocks 
with j elements is given by the multinomial coefficient 7^ i . . 

The moments of S are related to the moments of S as 
(3.4) a J = ca j for j = 1,2,.... 

We can use (3.2) to compute the first few moments of S in terms of the 
moments of XI. This involves enumerating the partitions that appear in the 
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computation of the multinomial coefficient in (3.3). For j = 1 only i\ = 1 
contributes with 7^ = 1, thus, 

(3.5) af = caf. 

For n = 2 only i\ = 2, 12 = and i\ = 0, «2 = 1 contribute with 

^( 2 ) - 1 ~( 2 ) - 1 

'2,0 ~~ i ' '0,1 — 1 

and thus 

(3.6) af = raf + c 2 (af) 2 . 

For n = 3 we have three possibilities for the indices, contributing with 

-v (3) -1 -v (3) -3 -v (3) -1 

'3,0,0 — l i '1,1,0 — °) '0,0,1 — 

thus 

(3.7) of = caf +3c 2 afa| : + c 3 (af) 3 . 

For n = 4 we have five possibilities for the indices, contributing with 

-v (4) -1 -v (4) -fi -v (4) -2 -v (4) -4 -y (4) -1 

'4,0,0,0 — x ' '2,1,0,0 — D ' '0,2,0,0 — z ' 71,0,1,0 — ^1 7o,0,0,l _ L > 

thus 

(3.8) af = caf + 4c 2 af af + 2c 2 (a| : ) 2 + 6c 3 (af ) 2 af + c 4 (af ) 4 . 

For specific instances of X, we simply plug the moments af into the above 
expressions to get the corresponding moments of S. The general formula in 
(3.2) can be used to generate the expressions for higher-order moments. 
We can efficiently enumerate the sum-constrained partitions that appear in 
(3.2) by employing the algorithm that recursively computes the nonnegative 
integer sequences s(k) of length j with the sum constraint YjL=i s (^) ^ = i 
listed below 

Input: Integer j 

Output: Nonnegative integer sequences s(k) of length j satisfying con- 
straint J2k = i s (k) x k]=n 

If 7 = 1 

The only sequence of length 1 is s = j 
else 

for k = to 1 

Compute sequences of length j — 1 for j — k x j 
Append k to each sequence above and include in output 
end 
end 
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3.2. Computation of covariance moments of second-order limit distribu- 
tion. Equations (2.3a) and (2.3b) express the relationship between the co- 
variance of the second-order limit distribution and the moments of S. Let 
M{x) denote a moment power series as in Notation 2.3 with coefficients ctj. 
Define the power series H{x) = xM{x) and let 



(3.9) H(x,y):-- 



{H(x)-H(y)Y (x-yf 



so that M.°°(x, y) := xyTC(x, y). The (i, j)th coefficient of M.°°(x, y) can then 
be extracted from a multivariate Taylor series expansion of 7i(x,y) about 
x = 0, y = 0. From (2.3a), we then obtain the coefficients afj = (2//3)afj a ° . 
The coefficients afj can be readily enumerated by invoking a short sequence 
of commands in the Maple computer algebra system. For example, the code 
on the next page will enumerate af 2 - By modifying this code, we can obtain 

the coefficients afj in terms of on := af = dj /c for other choices of indices 
i and j and the constant max_coeff chosen such that i + j < 2max_coeff. 

> with(numapprox) : 

> max_coeff := 5: 

> H := x -> x*(l+sum(alpha[j]*x~2,j=l. . 2*max_coef f ) ) : 

> dHx : = diff (H(x) ,x) : dHy := dif f (H(y) ,y) : 

> H2 := simplify(dHx*dHy/(H(x)-H(y))~2-l/(x-y)~2: 

> H2series := mtaylor(H2, [x,y] ,2*max_coef f ) : 

> i:=5: j =2: 

> M2_infty_coef f [i, j] 

: = simplif y (coef f (coef f (H2series ,x, i-1) ,y, j-1) ) : 

> alphaS_second[i, j] := (2/beta) *M2_inf ty_coef f [i , j] : 

Table 3 lists some of the coefficients of A4°° obtained using this procedure. 
When aj = 1 for all j £ N, then atij = as expected, since aj = 1 denotes the 
identity matrix. Note that the moments a±, . . . , an+j are needed to compute 
the second-order covariance moments ai j = otj.i. 

The covariance matrix Q with elements Qjj = ct^j gets increasingly ill- 
conditioned as dim(Q) increases; the growth in the magnitude of the diago- 
nal entries ctjj in Table 3 attests to this. This implies that the eigenvectors 
of Q encode the information about the covariance of the second-order limit 
distribution more efficiently than the matrix Q itself. When S = I so that 
the SCM S has the (null) Wishart distribution, the eigenvectors of Q are 
the (appropriately normalized) Chebyshev polynomials of the second kind 
[Mingo and Speicher (2006)]. The structure of the eigenvectors for arbitrary 
I] is, as yet, unknown though research in that direction might yield addi- 
tional insights. 
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Table 3 

Relationship between the coefficients au,j = ctj,i and Ui 



Coefficient Expression 

2 

ai,i a.2 — cti 

a 2 ,i — 4aiQ2 + 2a? + 2a :j 

02,2 16a?a2 — 6a| — 6a? — 8aiQ3 + 4a4 

a3,i 9a?a2 — 60103 — 3a| + 3a4 — 3a? 

«3,2 6Q5 + 30aiQ2 — 42a?a2 — 180203 + 12a? + 24a?a3 — 12aia4 

£* 3 ,3 -18a§ - 27a 2 a 4 + 9a 6 - 30a? + 21a! + 36a?a 4 - 72a?a 3 + 126a?a 2 - 

135a?a! + 108ai Q2Q3 — 18aia5 

«4,i 12ai«2 — 16afa2 — 8020:3 + 12a? 03 — 80404 + 4a? + 4as 

a 4 ,2 -12a§ - 24o 2 o 4 + 8o 6 - 20a? + 16a| + 32a? a 4 - 56a?a 3 + 88a? a 2 - 

960i02 + 8OQ1O2O3 — 16oios 
a 4 ,3 960203 + 60a? + 84oiol + 432a?a! + 180a?a 3 - 48o 3 o 4 + 12a 7 - 36a 2 a 5 - 

24ai06 + 144oiQ204 +48oiOs — 96a? 04 — 15601a! — 300ofo2 — 396a?a2a3 
e*4,4 — 140of — 76a! — 48o6Q2 + 256Q3O4O1 — 40a;f + 16ag — 640305 — 32oiQ7 + 

1408a?a 2 a 3 - 336a?a| + 256a?a 4 + 144a!a 4 - 480a?a 3 + 160a 2 a| + 

64a?a 6 - 128a?a 5 - 1440a?a! + 832a?a! + 800a?a 2 - 768aia|a 3 - 

576aiCt2Ct4 + 192aia2Q5 

a 5 ,i -5a| - 10a 2 a 4 + 5a 6 - 5a? + 5a! + 15a?a 4 - 20a? a 3 + 25a?a 2 - 30a? a! + 

30aia2a3 — lOaias 

a 5 , 2 60a!a 3 +30a? + 50aia| + 240a? a! + HOaf a 3 - 30a 3 a 4 + 10a 7 - 30a 2 a 5 - 

20aia 6 + 100aia 2 a 4 + 40a?a 5 - 70a? a 4 - 90aia! - 160a?a 2 - 240a? a 2 a 3 

c*5,3 —105a? — 60a| — 45a6a2 + 210030404 — 30a;! + 15ag — 600305 — 30aia7 + 

1140a?a 2 a 3 - 270a?a§ + 225a?a 4 + 120ala 4 - 390a?a 3 + 135a 2 a§ + 
60a?a 6 - 120a?a 5 - 1125a?a! + 660a?a! + 615a?a 2 - 630aia|a 3 - 
495a?a2a4 + 180aiO2a 5 

as, 4 -900a?a 4 a 3 + 80a?a 7 - 160a?a 6 - 620a?a 4 - 3200a?a| + 700aiai + 

3960a?a! - 720a?a 5 a 2 + 1840a?a 4 a 2 - 4100a?a 3 a 2 + 3600a?a!a 3 - 
1140aia|a 2 + 1040a?a| - 440a|a 3 + 440a 3 a 4 a 2 + 240aia 6 a 2 + 
320aia 5 a 3 - 1020aia|a 4 + 20a 9 - 1820a?a 2 + 180ala 5 + 320a?a 5 + 
180ai al + 1 120a?a 3 + 80a| + 280a? - 40ai a 8 - 60a 7 a 2 - 80a 3 a 6 - 100a 4 a 5 

as, 5 2400a2O5a? — 1350a!asai + 600a3a 5 a2 + 300aia 7 a2 — 900aea2a? — 

1200a 3 a 5 a? + 400aia 6 a 3 + 3000a 3 a 4 a? + 5100a?a!a 4 + 12300a? a 2 a 3 + 
5700a?a 2 a| + 4400aia!a 3 + 400a?a 6 - 15000a?a|a 3 - 5750a?a 2 a 4 - 
200a?a 7 + 500aia 4 a 5 + 225a 6 a| - 675ala? - 3250a?a§ - 625a!a 4 + 
350ala 4 - 600aial - 1050a!a| -2800a 3 a? - 11550a?a! -3300a 3 a 4 aia 2 - 
800a 5 a? + 325ala 2 - 4375a?a| - 630a?° + 100a 8 a? - 75a| + 255a! + 
12000a?a! + 4550a?a 2 + 1550a?a 4 + 25aio - 50aia 9 - 75a 2 a 8 - 100a 3 a 7 - 
125a4ae 
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4. Eigen-inference algorithms. 

4.1. Estimating 9 for known model order. Estimating the unknown pa- 
rameter vector 6 follows from the asymptotic result in Proposition 3.1. For 
large p, n, since vg is (approximately) normally distributed we can obtain 
the estimate 6 by the principle of maximum-likelihood. When S is real, Bai 
and Silverstein provide a formula, expressed as a difficult-to-compute con- 
tour integral, for the correction term fig in (3.1). The log-likelihood of vg 
is (ignoring constants and the correction term for the mean when S is real) 
given by 

(4.1) ^Vfllfl^-v^Q^ve-logdetQe, 

which allows us to obtain the maximum-likelihood estimate of as 

(4.2) Br q \ = argmin Q« 1 vg + log det Qg for q = dim(ve) > dim(0), 

060 

where represents the parameter space for the elements of 6 and vg and 
Qg are constructed as in Proposition 3.1. 

4.2. Guidelines for picking q := dim(v^) . Canonically, the parameter vec- 
tor of models such as (1.2) is of length 2k — 1 so that q = dim(ve) must 
equal or exceed 2k — 1. In principle, estimation accuracy should increase 
with q since the covariance of vg is explicitly accounted for via the weight- 
ing matrix Qg. 

Figure 2 compares the quantiles of the test statistic v'gQ^vg for dim(ve) = 
q with the quantiles of the chi-squared distribution with q degrees of freedom 
when q = 2, 3 for the model in (1.2) with = (0.5, 2, 1), n = p and p = 40 and 
p = 320. While there is good agreement with the theoretical distribution for 
large n,p, the deviation from the limiting result is not insignificant for mod- 
erate n,p. This justifies setting q = 2 for the testing procedures developed 
herein. 

In the most general estimation setting as in (4.2) where includes the 
smallest population eigenvalue of Tig we have found that q := dim(ve) must 
be no smaller than dim(0) + 1. When the smallest eigenvalue of ~Sg is known, 
however, q can be as small as dim(0). Within these guidelines, picking a 
smaller value of q provides robustness in low-to moderate-dimensional set- 
tings where the deviations from the asymptotic result in Theorem 2.7 are not 
insignificant. Numerical simulations suggest that the resulting degradation 
in estimation accuracy in high-dimensional settings, in using the smallest 
suggested choice for q instead of a higher value, is relatively small. This 
loss in performance is offset by an increase in the speed of the underlying 
numerical optimization routine. This is the case because, though the dimen- 
sionality of is the same, the matrix Q gets increasingly ill-conditioned 
for higher values of q, thereby reducing the efficiency of optimization meth- 
ods. 
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* S ID li » IS * 5 10 1S » K 

Standard Cfii-Squsre QuantlUs (3 dsgnsea of 'rseHom> Standard tftl-Squaro QusnliJofl (3 (fogpaes of lra«dDn» 

(c) (d) 

Fig. 2. Numerical simulations (when S is complex) illustrating the robustness of the 
distribution approximation for the test statistic in Table 1 formed with dim(v) — 2 to 
moderate- dimensional settings, (a) p = n = 40: dim(v) = 2. (b) p = n — 320/ dim(v) = 2. 
(c) p = n = 40: dim(v) =3. (d) p = n = 320: dim(v) = 3. 
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4.3. Testing 8 = 9q. 



Proposition 4.1. Define the vector v# and the covariance matrix Qg 



as 



(4.3a) v 



TrS^ 



TrS -paf 

«f + -(«?) 2 
n 







la 



n 



(4.3b) Q = - 



a 2 — a\ 2a\ + 2a 3 — Aa 1 a 2 

2a\ + 2a 3 — 4S 1 S 2 4a 4 — 85i5 3 — + 165 2 S^ — 65^ 



with fi = \ (or 2) when S is real (or complex) and ai = af given by 



(4.4a) qi = -af, 
n 

(4.4b) 5 2 = % s + 4(af) 2 , 
n n z 

2 3 

(4.4c) q 3 = -of + 3^af + ^(af ) 3 , 

2 2 3 4 

~ P £ i aP £ £ i r,P I £\2 , C P 1 S\2 £ i P i £\4 

(4.4d) a 4 = -a 4 + 4^a x a 3 + 2^{a 2 ) + 6— , (a x ) a 2 + ~ rC^i ) > 
n ra z n z n 13 



and af = (l/p)Tr5] 1 = X^=i a j*j- Thus, for large p and n, vg ~ AA(0,Qg) 
so £/iai 



(4.5) W:=vjQe 1 ve~x|. 

Proof. This follows from Proposition 3.1. The correction term for the 
real case is discussed in a different context in Dumitriu, Edelman and Shuman 
(2007). A matrix-theoretic derivation in the real case {[3 = 1) can be found 
in Srivastava (2005), Corollary 2.1, page 3. □ 



We test for 9 = 9q by obtaining the test statistic 
(4-6) Hg o :h(9 o )=vlQ e ^ 0o , 

where the vg and Qg are constructed as in (4.3a) and (4.3b), respectively. 
We reject the hypothesis for large values of Hg . For a choice of threshold 
7, the asymptotic convergence of the test statistic to the X2 distribution 
implies that 

(4.7) Prob.(tf 0() = l|0 = o )«F*i(7). 

Thus, for large p and n, when 7 = 5.9914, Prob. (Hg = 1\9 = 9q) ~ 0.95. 
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4.4. Estimating 9 and testing the estimate. When a is obtained using 
(4.2) then we may test for 9 = by forming the testing statistic 

(4.8) ^:/ i (0)=u|W^ 1 u ? , 

where the u~ and are constructed as in (4.3a) and (4.3b), respectively. 
However, the sample covariance matrix S can no longer be used since the 
estimate 6 was obtained from it. Instead, we form a test sample covariance 
matrix constructed from [(n/2)] randomly chosen samples. Equivalently, 
since the samples are assumed to be mutually independent and identically 
distributed, we can form the test matrix from the first [(n/2)] samples as 

, rn/21 

(4.9) 5 ^i>' x -' 

Note that af will have to be recomputed using £~ and c = p/\(n/2)~\. The 

hypothesis 6 = is tested by rejecting values of the test statistic greater than 
a threshold 7. The threshold is selected using the approximation in (4.7). 
Alternately, the hypothesis can be rejected if the recentered and rescaled 
largest eigenvalue of S is greater than the threshold 7. The threshold is 
selected using the quantiles of the (real or complex) Tracy- Widom distribu- 
tion. The recentering and rescaling coefficients are obtained by the procedure 
described in El Karoui (2007). 

4.5. Estimating 9 for unknown model order. Suppose we have a family 
of models parameterized by the vector 6^ k \ The elements of 0^ are the 
free parameters of the model. For the model in (1.2), in the canonical case 

6 = (t\, . . . , t&_i, a%, . . . , afc) since t±-\ + tk_i + tf. = 1 so that dim(6^) = 

2k — 1. If some of the parameters in (1.2) are known, then the parameter 
vector is modified accordingly. 

When the model order is unknown, we select the model which has the 
minimum Akaike Information Criterion. For the situation at hand we pro- 
pose that 

= 0® 

( 4 -!0) 

where k = argmm{u^ fe ) W^Liu^) + logdet W^ (fc) } + 2dim(6^), 
fceN e & Q 

where u~(*) and W~(fc) are constructed as described in Section 4.4 using the 


test sample covariance matrix in (4.9). Alternately, a sequence of nested hy- 
pothesis tests using a largest eigenvalue based test as described in El Karoui 
(2007) can be used. It would be useful to compare the performance of the 
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proposed and the nested hypothesis testing procedures in situations of prac- 
tical interest. 

In the point of view adopted in this article, the sample eigen-spectrum 
is a single observation sampled from the multivariate probability distribu- 
tion in (1.5). Thus we did not consider a Bayesian Information Criterion 
(BIC) based formulation in (4.10) because of the resulting degeneracy of 
the conventional "log(Sample Size)" penalty term. In the context of model 
selection, the study of penalty function selection, including issues that arise 
due to dimensionality, remains an important topic whose full resolution is 
beyond the scope of this article. Nevertheless, we are able to demonstrate 
the robustness of the method proposed in (4.10) in some representative sit- 
uations in Section 6. 

5. Numerical simulations. Let be as in (1.2) with 6 = (ti, 01,02). 
When t\ = 0.5, a± = 2 and 02 = 1, then half of the population eigenvalues 
are equal to 2 while the remainder are of magnitude 1. Let the unknown 
parameter vector 6 = (i, a) where t = t\ and a = a\. Using the procedure 
described in Section 3.1, the first four moments can be obtained as (here 
c = p/n) 

(5.1a) of = l + t(o-l), 

(5.1b) of = (-2oc + a 2 c + c)t 2 + (-1 + 2ac - 2c + a 2 )t + 1 + c, 

of = (-3cV + a 3 c 2 - c 2 + 3ac 2 )t 3 
(5.1c) + (3c 2 + 3c 2 a 2 - 3ac - 6ac 2 - 3a 2 c + 3a 3 c + 3c)t 2 

+ (-3c 2 + a 3 - 1 - 6c + 3ac + 3a 2 c + 3ac 2 )t + 1 + c 2 + 3c, 
af = (6a 2 c 3 + a 4 c 3 - 4ac 3 - 4a 3 c 3 + c 3 )t 4 
+ (-6c 2 - 12a 3 c 2 + 12ac 3 

- 12a 2 c 3 + 4a 3 c 3 + 12ac 2 + 6a 4 c 2 - 4c 3 )t 3 
+ (-4a 2 c - 4ac - 12ac 3 - 24ac 2 + 6a 4 c 

+ 6a 2 c 3 + 12a 3 c 2 + 6c - 6c 2 a 2 + 6c 3 + 18c 2 - 4a 3 c)t 2 
+ (-4c 3 + 4ac + 6c 2 a 2 + 4ac 3 

- 1 + 12ac 2 - 18c 2 + 4a 2 c - 12c + 4a 3 c + a 4 )t 

+ l + c 3 + 6c + 6c 2 . 

From the discussion in Section 3.2, we obtain the covariance of the second- 
order limit distribution 

c 2 (af-a 2 ) c 3 (2(af) 3 + 2af -4afaf) 

c 3 (2(af ) 3 + 2af - 4afaf ) c 4 (4af - 8af af 

-6(af) 2 + 16af(af) 2 -6(af) 4 ) 



(5.1d) 



(5-2) Q fl = | 
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Table 4 

Quality of estimation oft — 0.5 for different values of p (dimension of observation vector) 
and n (number of samples) — both real and complex case for the example in Section 5 



Complex case Real case 

p n Bias MSE (MSE X p 2 )/100 Bias MSE (MSExp 2 )/100 



(a) n = 0.5p 



20 


10 


0.0455 


0.3658 


1.4632 




0.4862 


1.2479 


4.9915 


40 


20 


-0.0046 


0.1167 


1.8671 




0.2430 


0.3205 


5.1272 


SO 


40 


-0.0122 


0.0337 


2.1595 




0.1137 


0.08495 


5.437 


160 


80 


-0.0024 


0.0083 


2.1250 




0.0598 


0.02084 


5.335 


320 


160 


0.0008 


0.0021 


2.1790 
(b)n = 


-V 


0.0300 


0.00528 


5.406 


20 


20 


-0.0137 


0.1299 


0.5196 




0.2243 


0.3483 


1.3932 


40 


40 


-0.0052 


0.0390 


0.6233 




0.1083 


0.0901 


1.4412 


80 


80 


-0.0019 


0.0093 


0.5941 




0.0605 


0.0231 


1.4787 


160 


160 


-0.0005 


0.0024 


0.6127 




0.0303 


0.0055 


1.4106 


320 


320 


-0.0001 


0.0006 


0.6113 

(c) n = 




0.0162 


0.0015 


1.5155 


20 


40 


-0.0119 


0.0420 


0.1679 




0.1085 


0.1020 


0.4081 


40 


80 


-0.0017 


0.0109 


0.1740 




0.0563 


0.0255 


0.4079 


80 


160 


-0.0005 


0.0028 


0.1765 




0.0290 


0.0063 


0.4056 


160 


320 


-0.0004 


0.0007 


0.1828 




0.0151 


0.0016 


0.4139 


320 


640 


0.0001 


0.0002 


0.1752 




0.0080 


0.0004 


0.4024 



where (3 = 1 when S is real- valued and (3 = 2 when S is complex- valued. 

We then use (4.2) to estimate 6 and hence the unknown parameters t and 
a. Tables 4 and 5 compare the bias and mean squared error of the estimates 
for a and t, respectively. Note the 1/p 2 type decay in the mean squared 
error and how the real case has twice the variance as the complex case. 
As expected by the theory of maximum-likelihood estimation, the estimates 
become increasingly normal for large p and n. This is evident from Figure 3. 
As expected, the performance improves as the dimensionality of the system 
increases. 

6. Model order related issues. 

6.1. Robustness to model order overspecification. Consider the situation 
when the samples are complex- valued and the true covariance matrix J] = 
21. We erroneously assume that there are two blocks for the model in 
(1.2) and that ai = 1 is known while a := a\ and t := t\ are unknown and 
have to be estimated. We estimate 6 = (a,t) using (4.2) as before. The 
empirical cumulative distribution function (CDF) of t over 4000 Monte 
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Table 5 

Quality of estimation of a — 2 for different values of p (dimension of observation vector) 
and n (number of samples) — both real and complex case for the example in Section 5 



Complex case Real case 



p 


n 


Bias 


MSE 


(MSE x p 2 )/100 


Bias 


MSE 


(MSE x p 2 )/100 










(a) n = 0.5p 








20 


10 


0.1278 


0.1046 


0.4185 


0.00748 


0.1024 


0.4097 


40 


20 


0.0674 


0.0478 


0.7647 


-0.01835 


0.04993 


0.7989 


80 


40 


0.0238 


0.0111 


0.7116 


-0.02240 


0.01800 


1.1545 


160 


80 


0.0055 


0.0022 


0.5639 


-0.02146 


0.00414 


1.0563 


320 


160 


0.0007 


0.0005 


0.5418 


-0.01263 


0.00112 


1.1692 










(b) n=p 








20 


20 


0.0750 


0.0525 


0.2099 


-0.0019 


0.0577 


0.2307 


40 


40 


0.0227 


0.0127 


0.2028 


-0.0206 


0.0187 


0.2992 


80 


80 


0.0052 


0.0024 


0.1544 


-0.0206 


0.0047 


0.3007 


160 


160 


0.0014 


0.0006 


0.1499 


-0.0126 


0.0012 


0.3065 


320 


320 


0.0003 


0.0001 


0.1447 


-0.0074 


0.0003 


0.3407 










(c) n = 2p 








20 


40 


0.0251 


0.0134 


0.0534 


-0.0182 


0.0205 


0.0821 


40 


80 


0.0049 


0.0028 


0.0447 


-0.0175 


0.0052 


0.0834 


80 


160 


0.0015 


0.0007 


0.0428 


-0.0115 


0.0014 


0.0865 


160 


320 


0.0004 


0.0002 


0.0434 


-0.0067 


0.0004 


0.0920 


320 


640 


0.0000 


0.0000 


0.0412 


-0.0038 


0.0001 


0.0932 



Carlo trials shown in Figure 4(d) shows that t — ► 1 as p,n(p) — > oo. Fig- 
ure 4(c) compares the quantiles of test statistic in (4.5) with that of the 
chi-squared distribution with two degrees of freedom. The excellent agree- 
ment for modest values of p and n validates the distributional approxima- 
tion. Figure 4(a) and (b) plot the mean squared errors in estimating a and 
t, respectively. As before, the mean squared error exhibits a 1/p 2 behav- 
ior. Table 6 shows the 1/p decay in the bias of estimating these parame- 
ters. 

For this same example, the seventh and eighth columns of Table 6 show 
the level at which a sphericity and the 2 block hypothesis are accepted 
when the procedure described in (4.2) is applied and a threshold is set at 
the 5% significance level. The ninth and tenth columns of Table 6 show the 
acceptance rate for the 2 block hypothesis when the largest eigenvalue test 
proposed in El Karoui (2007) is applied on a test sample covariance matrix 
formed using first \n/2] samples and the original sample covariance matrix, 
respectively. The largest eigenvalue value test has an acceptance rate closer 
to the 5% significance level it was designed for. For all of the p and n values in 
Table 6, over the 4000 Monte Carlo trials, applying the procedure described 
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(d) 



Fig. 3. Normal probability plots of the estimates of a and t (true values: a — 2, t = 0.5,) 
for the example in Section 5. (a) a: p = 320, n — 640. (b) t: p = 320, p = 640. (c) a: 
p = 320, n = 640 (real-valued), (d) rj: p = 320, n = 640 (real-valued). 



in Section 4.5 produced the correct estimate A; = 1 for the order of the model 
in (1.2) when E = 21. 




Fig. 4. Performance of estimation algorithm when model order has been overspecified 
and S is complex. The population covariance matrix X = 21 which corresponds in (1.2) 
to ai — 1, and tj = 1 for arbitrary a 2 . We run the estimation algorithm assuming that 
a 2 — 1 and estimate a :— ai and t := ti in (1.2). (a) MSE: a. (b) MSE: i. (c) QQ plot: 
Test statistic in (4-5) for p — 320 = 2n. (d) Empirical CDF of i: n = p/2. 



STATISTICAL EIGEN-INFERENCE 



27 



Table 6 

Performance of estimation algorithm when model order has been overspecified and S is 

complex 







(i 




i 


i 


Sphericity 


2 Block 


x t t 

•"•max litJoli 


x t t 

"max itJoL 


p 


n 


Bias 


Bias X p 


Bias 


Bias X p acceptance acceptance 


(full) 


(half) 












(a) n - 


= v/2 










10 


5 


0.3523 


3.5232 


-0.1425 


— 1.4246 


0.9820 





9801 


1.0000 


0.9698 


20 


10 


0.1997 


3.9935 


-0.1157 


-2.3148 


0.9783 





9838 


0.9998 


0.9710 


40 


20 


0.1078 


4.3114 


-0.0783 


-3.1336 


0.9795 





9870 


0.9958 


0.9713 


80 


40 


0.0545 


4.3561 


-0.0463 


-3.7018 


0.9765 





9873 


0.9838 


0.9720 






0979 




— U.UZOl 


-4.0175 


0.9743 





9828 


u.y * do 




320 


160 


0.0141 


4.5261 


-0.0133 


-4.2580 


0.9805 





9885 


0.9753 


0.9675 












(b) n 


= P 










10 


10 


0.2087 


2.0867 


-0.1123 


-1.1225 


0.9793 


0. 


9768 


0.9998 


0.9675 


20 


20 


0.1050 


2.0991 


-0.0753 


-1.5060 


0.9773 





9845 


0.9965 


0.9723 


40 


40 


0.0558 


2.2312 


-0.0470 


-1.8807 


0.9850 





9898 


0.9898 


0.9743 


80 


80 


0.0283 


2.2611 


-0.0255 


-2.0410 


0.9813 





9868 


0.9773 


0.9710 


160 


160 


0.0137 


2.1990 


-0.0130 


-2.0811 


0.9805 





9870 


0.9790 


0.9613 


320 


320 


0.0067 


2.1455 


-0.0067 


-2.1568 


0.9775 


0. 


9835 


0.9608 


0.9603 












(c) n 


= 2p 










10 


20 


0.1067 


1.0674 


-0.0717 


-0.7171 


0.9790 





9810 


0.9993 


0.9708 


20 


40 


0.0541 


1.0811 


-0.0442 


-0.8830 


0.9753 





9858 


0.9890 


0.9708 


40 


80 


0.0290 


1.1581 


-0.0257 


-1.0272 


0.9743 





9845 


0.9830 


0.9695 


80 


160 


0.0140 


1.1161 


-0.0131 


-1.0497 


0.9763 





9850 


0.9743 


0.9658 


160 


320 


0.0071 


1.1302 


-0.0068 


-1.0883 


0.9778 





9830 


0.9703 


0.9578 


320 


640 


0.0036 


1.1549 


-0.0035 


-1.1237 


0.9758 





9833 


0.9598 


0.9608 



The population covariance matrix X = 21 which corresponds in (1.2) to a\ = 1, and t\ — 1 
for arbitrary a^. We run the estimation algorithm assuming that a\ = 1 and estimate 
a :— 02 and t := ti in (1.2). 



6.2. Robust model order estimation. We revisit the setting in Section 
5, where the population parameter vector 6 = (ai,a2,ii) = (2,1,0.5) and 
the sample covariance matrix is formed from complex-valued data. We em- 
ploy the procedure described in Section 4.5 to estimate the model order k 
(assumed unknown) and the corresponding 2k — 1-dimensional parameter 
vector 6^ . Over 1000 Monte Carlo trials, for values of p, n listed in Table 
7, we observe the robustness to model order overspecification as in Section 
6.1. Additionally, we note that as p,n(p) — > 00, the correct model order is 
estimated consistently. Table 7 demonstrates that, as before, the parame- 
ter vector is estimated with greater accuracy as the dimensionality of the 
system is increased. The parameter estimates appear to be asymptotically 
unbiased and normally distributed as before. 



Table 7 

Performance of parameter estimation algorithm when model order has to be estimated as well and S is complex 



p 


n 


Pr(fc = 1) 




a 


Pr(fe = 2) 


Si 


ft2 


ti 












(a) n = p/2 








20 


10 


0.968 


1 


4867 + 11 05 


0.032 


1 8784 ± 7384 


8785 ± 6376 


5675 ± 2650 


40 


20 


0.940 


1 


4985 + 0567 


0.060 


2 0287 ± 7244 


6929 ± 6010 


6041 ±0 3165 


80 


40 


0.700 


I 


4990 + 0974 


0.300 


2 0692 + 4968 


7604 + 4751 


5624 + 9965 


160 


80 


0.199 


1 


4998 + 0142 


0.801 


2 0199 ± 2780 


9062 ± 2841 


531 1 ± 2084 


320 


160 


0.001 


1 


4QQQ + fl 0069 


0.999 


2 0089 ± 1 398 


9763 ± 1 341 


5076 ± 1 239 


480 


240 




1 


.4999 ± 0.0046 


1 

(b) n = p 


2.0004 ±0.0967 


0.9847 ± 0.0918 


0.5076 ±0.0887 


20 


20 


0.915 


1 


.4867 ±0.0806 


0.085 


1.9229 ±0.5675 


0.6747 ±0.5748 


0.6293 ±0.2962 


40 


40 


0.736 


1 


.4987 ±0.0381 


0.264 


1.9697 ±0.3719 


0.7685 ±0.4199 


0.5920 ± 0.2644 


80 


80 


0.190 


1 


.5005 ±0.0197 


0.810 


2.0021 ±0.2273 


0.9287 ± 0.2323 


0.5310 ±0.1856 


160 


160 


0.004 


1 


.4997 ±0.0099 


0.996 


1.9908 ±0.1108 


0.9771 ± 0.0995 


0.5162 ±0.0973 


320 


320 




1 


.5000 ± 0.0048 


1 


2.0001 ± 0.0548 


0.9960 ± 0.0469 


0.5024 ± 0.0492 


480 


480 




1 


.5000 ± 0.0033 


1 

(c) n = 2p 


2.0018 ± 0.0363 


1.0002 ±0.0310 


0.4991 ±0.0327 


20 


40 


0.743 


1 


.4972 ± 0.0556 


0.257 


1.9124 ± 0.3044 


0.7835 ± 0.3756 


0.6087 ± 0.2424 


40 


80 


0.217 


1 


.5002 ±0.0286 


0.783 


1.9707 ±0.1797 


0.9361 ±0.1659 


0.5444 ±0.1458 


80 


160 




1 


.4996 ±0.0139 


1 


1.9925 ±0.0975 


0.9847 ±0.0781 


0.5116 ±0.0807 


160 


320 




1 


.4999 ±0.0071 


1 


1.9975 ±0.0485 


0.9959 ± 0.0369 


0.5034 ±0.0401 


320 


640 




1 


.5001 ±0.0035 


1 


1.9994 ±0.0232 


0.9993 ± 0.0178 


0.5008 ±0.0193 


480 


960 




1 


.4999 ± 0.0024 


1 


1.9998 ±0.0161 


0.9996 ±0.0125 


0.5003 ±0.0135 



The population covariance matrix has parameters a?, = 2, a\ — 1 and t\ = 0.5 as in (1.2). The algorithm in (4.10) with dim(ij) = 2k for 
k = 1, 2, . . . , 5 was used to estimate the model order k and the associated 2k — 1-dimensional parameter vector 6 = (ai, . . . , Ofc,ti, . . .tk-i)- 
Numerical results shown were computed over 1000 Monte Carlo trials. 
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7. Inferential aspects of spiked covariance matrix models. Consider co- 
variance matrix models whose eigenvalues are of the form Ai > A2 > • • • > 
Afc > Xk+i = • • • = A p = A. Such models arise when the signal occupies a 
/c-dimensional subspace and the noise has covariance AI. Such models are 
referred to as spiked covariance matrix models. When k <p, then for large 
p, for vg defined as in Proposition 3.1, the matrix Qg may be constructed 
from the moments of the (null) Wishart distribution [Dumitriu and Rassart 
(2003)] instead, which are given by 

(7.D tf-^jrrC-XV 

where c = p/n. Thus, for q = 2, Qg is given by 



(7.2) Q e = Q A = - 



A 2 c 2A 3 (c+l)c 
2A 3 (c+l)c 2A 4 (2c 2 +5c + 2)c 



This substitution is motivated by Bai and Silverstein's analysis 
[Bai and Silverstein (2004)] where it is shown that when k is small rela- 
tive to p, then the second-order fluctuation distribution is asymptotically 
independent of the "spikes." When the multiplicities of the spike are known 
(say 1), then we let ti = l/p and compute the moments a j accordingly. The 
estimation problem thus reduces to 

(7.3) = argmin Vg Q^vg with q = dim(ve) = dim(0) + 1, 
0e© 

where A is an element of when it is unknown. 

Consider the problem of estimating the magnitude of the spike for the 
model in (1.2) with t\ = l/p, and 02 = 1 known and a\ = 10 unknown so 
that = a = a\. We obtain the estimate 9 from (7.3) with A = 1 wherein 
the moments af given by 



(7.4a) af 
(7.4b) of 



k 

S . 



-l + a + p 



p 

a 2 p — 2pc + c — 2ac + cp 2 + p 2 — p + 2pac + a 2 c 
p 2 



are obtained by plugging in t = l/p into (5.1). 

Table 8 summarizes the estimation performance for this example. Note 
the l/p scaling of the mean squared error and how the complex case has half 
the mean squared error. The estimates produced are asymptotically normal 
as seen in Figure 5. 
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Table 8 

Algorithm performance for different values of p (dimension of observation vector) and n 
(number of samples) — both real and complex case 









Complex case 




Real case 




p 


n 


Bias 


MSE 


MSE x p 


Bias 


MSE 


MSE x p 










(a) n = p 








10 


10 


-0.5528 


9.3312 


93.3120 


-0.5612 


18.4181 


184.1808 


20 


20 


-0.2407 


4.8444 


96.8871 


-0.2005 


9.6207 


192.4143 


40 


40 


-0.1168 


2.5352 


101.4074 


-0.0427 


4.9949 


199.7965 


80 


80 


-0.0833 


1.2419 


99.3510 


-0.03662 


2.4994 


199.9565 


160 


160 


-0.0371 


0.6318 


101.0949 


0.03751 


1.2268 


196.3018 


320 


320 


-0.0125 


0.3186 


101.9388 


0.04927 


0.6420 


204.4711 










(b) n = 1.5p 








10 


15 


-0.3343 


6.6954 


66.9537 


-0.3168 


12.7099 


127.0991 


20 


30 


-0.1781 


3.2473 


64.9454 


-0.1454 


6.4439 


128.8798 


40 


60 


-0.1126 


1.6655 


66.6186 


-0.08347 


3.2470 


129.88188 


80 


120 


-0.0565 


0.8358 


66.8600 


-0.02661 


1.6381 


131.04739 


160 


240 


-0.0287 


0.4101 


65.6120 


0.02318 


0.8534 


136.5475 


320 


480 


-0.0135 


0.2083 


66.6571 


0.02168 


0.4352 


139.2527 










(c) n = 2p 








10 


20 


-0.2319 


4.9049 


49.0494 


-0.2764 


9.6992 


96.9922 


20 


40 


-0.1500 


2.5033 


50.0666 


-0.1657 


4.6752 


93.5043 


40 


80 


-0.0687 


1.2094 


48.3761 


-0.03922 


2.5300 


101.2007 


80 


160 


-0.0482 


0.6214 


49.7090 


-0.02426 


1.2252 


98.0234 


160 


320 


-0.0111 


0.3160 


50.5613 


0.01892 


0.6273 


100.3799 


320 


640 


-0.0139 


0.1580 


50.5636 


0.02748 


0.3267 


104.5465 



7.1. Limitations. Consider testing for the hypothesis that 5] = I using 
real valued observations. For the model in (1.2), which is equivalent to testing 
6 = (1, 1), from the discussion in Section 4.3, we form the test statistic 

(7.5) H Sph .:h(6)=^Q g 1 v e , 

where Q# is given by (7.2) with /3 = 1 since the observations are real valued, 
A = 1 and 

TrS 



V0 



V 



TrS 2 -p 1 + 



P 



n 



1 



P 



n 



where c = p/n, as usual. We set a threshold 7 = 5.9914 so that we accept 
the sphericity hypothesis whenever h(0) < 7. This corresponds to the 95th 
percentile of the x| distribution. Table 9(a) demonstrates how the test is 
able to accept the identity covariance hypothesis when 'Sg = I at a rate 
close to the 5% significance level it was designed for. Table 9(b) shows the 
acceptance of the sphericity hypothesis when X# = diag(10, !,...,!) instead. 
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Results were tabulated over 4000 Monte-Carlo trials. Table 10 illustrates 
the performance of the sphericity test proposed by Ledoit and Wolf (2002) 
which consists of forming the test statistic 



(7.6) LW(S) := 



np 

T 



Tr[(S-I) 2 ] 



p 


-TrS 




n 


IP 


n 



X p (p+l)/2 



and rejecting for large values above a threshold that is determined by using 
the asymptotic chi-squared approximation. Note how whenp/n is large, both 
tests erroneously accept the identity covariance hypothesis an inordinate 
number of times. The faulty inference provided by the test based on the 
methodologies developed is best understood in the context of the following 
result. 



Proposition 7.1. Let S denote a sample covariance matrix formed 
from an p x n matrix of Gaussian observations whose columns are indepen- 
dent of each other and identically distributed with mean and covariance 
XI. Denote the eigenvalues ofT, by X\ > X2 > • • • > Xk > X^+i = ■ ■ ■ = X p = X. 
Let lj denote the jth largest eigenvalue of R. Then as p,n — > 00 with c n = 



Table 9 

The identity covariance hypothesis is rejected when the test statistic in (7.5) exceeds the 
5% significance level for the \ 2 distribution with 2 degress of freedom, that is, whenever 

h{9) > 5.9914 



n = 10 n = 20 n = 40 n = 80 n = 160 n = 320 n = 640 

(a) Empirical probability of accepting the identity covariance hypothesis when Sg = I 



p = 


10 


0.9329 


0.9396 


0.9391 


0.9411 


0.9410 


0.9464 


0.9427 


p = 


20 


0.9373 


0.9414 


0.9408 


0.9448 


0.9411 


0.9475 


0.9450 


p = 


40 


0.9419 


0.9482 


0.9487 


0.9465 


0.9467 


0.9451 


0.9495 


p = 


80 


0.9448 


0.9444 


0.9497 


0.9496 


0.9476 


0.9494 


0.9510 


p = 


160 


0.9427 


0.9413 


0.9454 


0.9505 


0.9519 


0.9473 


0.9490 


p = 


320 


0.9454 


0.9468 


0.9428 


0.9451 


0.9515 


0.9499 


0.9504 



(b) Empirical probability of accepting the identity covariance hypothesis 
when Se = diag(10, !,...,!) 



p = 10 


0.0253 


0.0003 












p = 20 


0.0531 


0.0029 












p = 40 


0.1218 


0.0093 












p = 80 


0.2458 


0.0432 


0.0080 










p= 160 


0.4263 


0.1466 


0.0002 










p = 320 


0.6288 


0.3683 


0.0858 


0.0012 
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Table 10 

The identity covariance hypothesis is rejected when the Ledoit-Wolf test statistic in (7.6) 
exceeds the 5% significance level for the \ 2 distribution with p(p + l)/2 degrees of freedom 



n = 10 n = 20 n = 40 n = 80 n = 160 n = 320 n = 640 

(a) Empirical probability of accepting the hypothesis £ = I using the Ledoit-Wolf test 



V 


= 10 


0.9483 


0.9438 


0.9520 


0.9493 


0.9510 


0.9553 


0.9465 


p 


= 20 


0.9498 


0.9473 


0.9510 


0.9513 


0.9498 


0.9495 


0.9423 


p 


= 40 


0.9428 


0.9545 


0.9468 


0.9448 


0.9488 


0.9460 


0.9478 


V 


= 80 


0.9413 


0.9490 


0.9513 


0.9540 


0.9480 


0.9500 


0.9460 


V 


= 160 


0.9438 


0.9495 


0.9475 


0.9520 


0.9508 


0.9543 


0.9448 


V 


= 320 


0.9445 


0.9475 


0.9493 


0.9490 


0.9485 


0.9468 


0.9453 



(b) Empirical probability of accepting the identity covariance hypothesis 
when £ = diag(10, !,...,!) using the Ledoit-Wolf test 



V 


= 10 


0.0345 


0.0008 






p 


= 20 


0.0635 


0.0028 






V 


= 40 


0.1283 


0.0130 






V 


= 80 


0.2685 


0.0450 


0.0008 




V 


= 160 


0.4653 


0.1575 


0.0070 




p 


= 320 


0.6533 


0.3700 


0.0773 


0.0010 



p/n — > c £ (0, oo), 

(7.7) ^M 1+ AFA> + 

(Ml + ^cf, tfA 3 -<A(l + v^), 

where j = 1, . . . , k and the convergence is almost surely. 

PROOF. See Baik and Silverstein (2006), Paul (2005), Baik, Ben Arous 
and Peche (2005). □ 

Since the inference methodologies we propose in this paper exploit the 
distributional properties of traces of powers of the sample covariance matrix, 
Proposition 7.1 pinpoints the fundamental inability of the sphericity test 
proposed to reject the hypothesis XI = I whenever (for large p, n) 

\ < 1 + \[^- 
V n 

For the example considered, Ai = 10, so that the above condition is met 
whenever p/n > ct = 81. For p/n on the order of ct, the resulting inability to 
correctly reject the identity covariance hypothesis can be attributed to this 
phenomenon and the fluctuations of the largest eigenvalue. 

Canonically speaking, eigen-inference methodologies which rely on traces 
of powers of the sample covariance matrix will be unable to differentiate 
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between closely spaced population eigenvalues in high-dimensional, sample 
sized starved settings. This impacts the quality of the inference in a funda- 
mental manner that is difficult to overcome. At the same time, however, the 
results in Baik and Silverstein (2006), Paul (2005), Baik, Ben Arous and Peche 

(2005) suggest that if the practitioner has reason to believe that the pop- 
ulation eigenvalues can be split into several clusters about cij ± ai^p/n, 
then the use of the model in (1.2) with a block subspace structure, where 
the individual blocks of sizes pi,---,Pk are comparable to p, is justified. 
In such situations, the benefit of the proposed eigen-methodologies will be 
most apparent and might motivate experimental design that ensures that 
this condition is met. 

8. Extensions and lingering issues. In the development of the estima- 
tion procedures in this article, we ignored the correction term for the mean 
that appears in the real covariance matrix case (see Proposition 3.1). This 
was because Bai and Silverstein expressed it as a contour integral which 
appeared challenging to compute [see (1.6) in Bai and Silverstein (2004)]. 
It is desirable to include this extra term in the estimation procedure if it 
can be computed efficiently using symbolic techniques. The recent work 
of Anderson and Zeitouni (2006), despite its ambiguous title, represents a 
breakthrough on this and other fronts. 

Anderson and Zeitouni encode the correction term in the coefficients of a 
power series that can be directly computed from the limiting moment series 
of the sample covariance matrix [see Theorem 3.4 in Anderson and Zeitouni 

(2006) ]. Furthermore, they have expanded the range of the theory for the 
fluctuations of traces of powers of large Wishart-like sample covariance ma- 
trices, in the real sample covariance matrix case, to the situation when the 
entries are composed from a broad class of admissible non-Gaussian dis- 
tributions. In such a scenario, the correction term takes into account the 
fourth moment of the distribution [see (5) and Theorems 3.3 and 3.4 in 
Anderson and Zeitouni (2006)]. This latter development might be of use in 
some practical settings where the non-Gaussianity is well characterized. We 
have yet to translate their results into a computational recipe for deter- 
mining the correction term though we intend to do so at a later date. We 
plan to make a software implementation based on the principles outlined in 
this paper available for download. The numerical results presented show the 
consistency of the proposed estimators; it would be of interest to establish 
this analytically and identify conditions in the real covariance matrix case, 
where ignoring the correction term in the mean can severely degrade the 
quality of estimation. The issue of how a local test that exploits global in- 
formation, of the sort proposed by El Karoui (2007), compares to the global 
test developed in this article in terms of hypothesis discriminatory power 
is an unresolved question of great interest. A more systematic investigation 
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is needed of the efficacy of various model order selection techniques for the 
problem considered. 

Acknowledgments. We are grateful to Debashis Paul for his feedback on 
an early version of this paper. His comments improved the clarity of our 
exposition. The initial investigation comparing the power of local versus 
global tests was motivated by Patrick Perry's penetrating questions; only 
some of the issues he raised have been addressed in this paper while others 
remain outstanding. 

It has been a privilege to interact with Debashis Paul and Jack Silver- 
stein on these and related matters during the authors' involvement in the 
SAMSI Program on High Dimensional Inference and Random Matrices. We 
thank Iain Johnstone for the opportunity to participate in this program and 
applaud James Berger, Nicole Scott, Margaret Polinkovsky, Sue McDonald 
and Donna Pike for creating the wonderful atmosphere at SAMSI. 

We thank Boaz Nadler for detailed and helpful comments on our manuscript. 
The first author thanks Arthur Baggeroer for encouragement and feedback 
from the earliest stages of this work. We thank the referees for their excellent 
feedback and suggestions. In particular we appreciate a referee's insistence 
that we investigate the model order estimation problem further. This moti- 
vated us to streamline the numerical code so that we could run the numerical 
simulations reported in Section 6.2 more seamlessly. 

REFERENCES 

Anderson, G. W. and Zeitouni, O. (2006). A CLT for a band matrix model. Probab. 
Theory Related Fields 134 283-338. MR2222385 

Anderson, T. W. (1963). Asymptotic theory of principal component analysis. Ann. Math. 
Statist. 34 122-248. MR0145620 

Bai, Z. D. and Silverstein, J. W. (1998). No eigenvalues outside the support of the 
limiting spectral distribution of large-dimensional sample covariance matrices. Ann. 
Probab. 26 316-345. MR1617051 

Bai, Z. D. and Silverstein, J. W. (2004). CLT for linear spectral statistics of large- 
dimensional sample covariance matrices. Ann. Probab. 32 553-605. MR2040792 

Bai, Z. D. and Silverstein, J. W. (2006). Spectral Analysis of Large Dimensional Ran- 
dom Matrices. Science Press, Beijing. 

Baik, J., Ben Arous, G. and Peche, S. (2005). Phase transition of the largest eigen- 
value for nonnull complex sample covariance matrices. Ann. Probab. 33 1643-1697. 
MR2165575 

Baik, J. and Silverstein, J. W. (2006). Eigenvalues of large sample covariance matrices 
of spiked population models. J. Multivariate Anal. 97 1382-1408. MR2279680 

Butler, R. W. and Wood, A. T. A. (2002). Laplace approximations for hypergeometric 
functions with matrix argument. Ann. Statist. 30 1155-1177. MR1926172 

Butler, R. W. and Wood, A. T. A. (2005). Laplace approximations to hypergeometric 
functions of two matrix arguments. J. Multivariate Anal. 94 1-18. MR2161210 



36 



RAO, MINGO, SPEICHER AND EDELMAN 



Collins, B., Mingo, J., Sniady, P. and Speicher, R. (2007). Second order freeness 

and fluctuations of random matrices. III. Higher order freeness and free cumulants. 

Doc. Math. 12 1-70. MR2302524 
Dey, D. K. and Srinivasan, C. (1985). Estimation of a covariance matrix under Stein's 

loss. Ann. Statist. 13 1581-1591. MR0811511 
Dumitriu, L, Edelman, A. and Shuman, G. (2007). MOPS: Multivariate orthogonal 

polynomials (symbolically). Symbolic Comput. 42 587-620. MR2325917 
Dumitriu, I. and Rassart, E. (2003). Path counting and random matrix theory. Electron. 

J. Combm. 7 R-43. MR2014530 
El Karoui, N. (2006). Spectrum estimation for large dimensional covariance matrices 

using random matrix theory. Available at http://arxiv.org/abs/math.ST/0609418. 
El Karoui, N. (2007). Tracy- Widom limit for the largest eigenvalue of a large class of 

complex sample covariance matrices. Ann. Probab. 35 663-714. MR2308592 
Haff, L. R. (1980). Empirical Bayes estimation of the multivariate normal covariance 

matrix. Ann. Statist. 8 586-597. MR0568722 
Johnstone, I. M. (2001). On the distribution of the largest eigenvalue in principal com- 
ponents analysis. Ann. Statist. 29 295-327. MR1863961 
Ledoit, O. and Wolf, M. (2002). Some hypothesis tests for the covariance matrix 

when the dimension is large compared to the sample size. Ann. Statist. 30 1081-1102. 

MR1926169 

Mingo, J. A. and Speicher, R. (2006). Second order freeness and fluctuations of random 
matrices. I. Gaussian and Wishart matrices and cyclic Fock spaces. J. Fund. Anal. 235 
226-270. MR2216446 

Muirhead, R. J. (1982). Aspects of Multivariate Statistical Theory. Wiley, New York. 
MR0652932 

Nadakuditi, R. R. (2007). Applied stochastic eigen-analysis. Ph.D. dissertation, Mas- 
sachusetts Institute of Technology, Dept. Electrical Engineering and Computer Science. 

NlCA, A. and Speicher, R. (2006). Lectures on the Combinatorics of Free Probabil- 
ity. London Mathematical Society Lecture Note Series 335. Cambridge Univ. Press. 
MR2266879 

Paul, D. (2005). Asymptotics of sample eigenstructure for a large dimensional spiked 

covariance model. Statist. Sinica 17 1617-1642. MR2399865 
RAO, N. R. (2006). RMTool: A random matrix calculator in MATLAB. Available online 

at http:/ /www.mit.edu/~raj/rmtool. 
Rao, N. R. and Edelman, A. (2006). Free probability, sample covariance matrices and 

signal processing. In Proceedings of ICASSP 5 V-1001-V-1004. 
Silverstein, J. W. and Combettes, J. W. (1992). Signal detection via spectral theory 

of large dimensional random matrices. IEEE Trans. Signal Process. 40 2100-2105. No. 

8. 

Smith, S. T. (2005). Covariance, subspace, and intrinsic Cramer-Rao bounds. IEEE 
Trans. Signal Process. 53 1610-1630. No. 5. MR2148599 

Srivastava, M. S. (2005). Some tests concerning the covariance matrix in high- 
dimensional data. J. Japan Statist. Soc. 35 251-272. MR2328427 

Srivastava, M. S. (2006). Some tests criteria for the covariance matrix with fewer obser- 
vations than the dimension. Acta Comment. Univ. Tartu. Math. 10 77-93. MR2309747 

Srivastava, M. S. (2007). Multivariate theory for analyzing high-dimensional data. J. 
Japan Statis. Soc. 37 53-86. MR2392485 

Tracy, C. and Widom, H. (1994). Level-spacing distributions and the Airy kernel. Com- 
mun. Math. Phys. 159 151-174. MR1257246 



STATISTICAL EIGEN-INFERENCE 



37 



Tracy, C. A. and Widom, H. (1996). On orthogonal and symplectic matrix ensembles. 
Commun. Math. Phys. 177 727-754. MR1385083 

Van Trees, H. L. (2002). Detection, Estimation, and Modulation Theory. Part IV. Op- 
timum Array Processing. Wiley, New York. 

Wishart, J. (1928). The generalized product moment distribution in samples from a 
normal multivariate population. Biometmka 20 32-52. 

N. R. Rao J. Mingo 

Department of Mathematics and EECS R. Speioher 

Massachusetts Institute of Technology Department of Mathematics 

Cambridge, Massachusetts 02142 and Statistics 

USA Queen's University 

E-mail: raj@mit.cdu Kingston, Ontario 

Canada K7L 3N6 
E-MAIL: mingo@mast.quccnsu.ca 
spcichcr@mast.quecnsu.ca 

A. Edelman 

Department of Mathematics 
Massachusetts Institute of Technology 
Cambridge, Massachusetts 02139 
USA 

E-MAIL: edelman@mit.cdu 



